Smart Paper Notes

Forecasting Future World Events with Neural Networks

Andy Zou , Tristan Xiao , Ryan Jia , Joe Kwon , Mantas Mazeika , Richard Li , Dawn Song , Jacob Steinhardt , Owain Evans , Dan Hendrycks

Categories: Machine Learning | Natural Language Processing

2022-06-30

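Forecasting future world events is a challenging but valuable task. Forecasts of climate, geopolitical conflict, pandemics, and economic indicators help shape policy and decision making. In these domains, the judgment of expert humans contributes to the best forecasts. Given advances in language modeling, can these forecasts be automated? To this end, we introduce Autocast, a dataset containing thousands of forecasting questions and an accompanying news corpus. Questions are taken from forecasting tournaments, ensuring high quality, real-world importance, and diversity. The news corpus is organized by date, allowing us to precisely simulate the conditions under which humans made past forecasts (avoiding leakage from the future). Motivated by the difficulty of forecasting numbers across orders of magnitude (e.g., global cases of COVID-19 in 2022), we also curate IntervalQA, a dataset of numerical questions and metrics for calibration. We test language models on our forecasting task and find that performance is far below a human expert baseline. However, performance improves when relevant information from the news corpus is incorporated. In sum, Autocast poses a novel challenge for large language models, and improved performance could bring large practical benefits.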
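The date-organized corpus described in the abstract above can be illustrated with a minimal sketch. It assumes a hypothetical `Article` record and a hypothetical helper `visible_articles`; neither name comes from the Autocast release, and the sample articles are made up for illustration. The point is only that a forecaster simulated at a given date should see articles published before that date and nothing later.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    """A news article with its publication date (hypothetical schema)."""
    published: date
    title: str


def visible_articles(corpus, forecast_date):
    """Return articles published strictly before forecast_date, oldest first.

    Dropping everything dated on or after the forecast date is what keeps
    information from the future out of the simulated forecast.
    """
    earlier = [a for a in corpus if a.published < forecast_date]
    return sorted(earlier, key=lambda a: a.published)


# Made-up sample corpus, for illustration only.
corpus = [
    Article(date(2022, 3, 1), "Spring outlook"),
    Article(date(2021, 12, 30), "Year-end case counts"),
    Article(date(2022, 1, 15), "New variant report"),
]

context = visible_articles(corpus, forecast_date=date(2022, 2, 1))
print([a.title for a in context])
```

Whether the cutoff should be strict (`<`) or inclusive (`<=`) depends on how forecast timestamps are recorded; the strict comparison here is the more conservative choice.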

translated by Google Translate

How to Steer Your Adversary: Targeted and Efficient Model Stealing Defenses with Gradient Redirection

Mantas Mazeika , Bo Li , David Forsyth

Categories: Machine Learning

2022-06-28

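Model stealing attacks present a dilemma for public machine learning APIs. To protect financial investments, companies may be forced to withhold important information about their models that could facilitate theft, including uncertainty estimates and prediction explanations. This compromise is harmful not only to users but also to external transparency. Model stealing defenses seek to resolve this dilemma by making models harder to steal while preserving utility for benign users. However, existing defenses perform poorly in practice, requiring either enormous computational overhead or severe utility trade-offs. To meet these challenges, we present a new approach to model stealing defenses called gradient redirection. At the core of our approach is a provably optimal, efficient algorithm for steering an adversary's training updates in a targeted manner. Combined with improvements to surrogate networks and a novel coordinated defense strategy, our gradient redirection defense, called GRAD${}^2$, achieves small utility trade-offs and low computational overhead, outperforming the best prior defenses. Moreover, we demonstrate how gradient redirection enables reprogramming the adversary with arbitrary behavior, which we hope will foster work on new avenues of defense.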

X-Risk Analysis for AI Research

Dan Hendrycks , Mantas Mazeika

Categories: Artificial Intelligence

2022-06-13

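Artificial intelligence (AI) has the potential to greatly improve society, but as with any powerful technology, it comes with heightened risks and responsibilities. Current AI research lacks a systematic discussion of how to manage long-tail risks from AI systems, including speculative long-term risks. Keeping in mind that AI may be integral to improving humanity's long-term potential, there is some concern that building ever more intelligent and powerful AI systems could eventually result in systems that are more powerful than us; some say this is like playing with fire and speculate that this could create existential risks (x-risks). To add precision to these discussions, we review a collection of time-tested concepts from hazard analysis and systems safety, which have been designed to steer large processes in safer directions. We then discuss how AI researchers can make long-term impacts on the safety of AI systems. Finally, we discuss how to robustly shape the processes that will affect the balance between safety and general capabilities.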

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Aarohi Srivastava , Abhinav Rastogi , Abhishek Rao , Abu Awal Md Shoeb , Abubakar Abid , Adam Fisch , Adam R. Brown , Adam Santoro , Aditya Gupta , Adrià Garriga-Alonso

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning | Machine Learning (Statistics)

2022-06-09

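Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.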

PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures

Dan Hendrycks , Andy Zou , Mantas Mazeika , Leonard Tang , Dawn Song , Jacob Steinhardt

Categories: Machine Learning | Computer Vision

2021-12-09

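In real-world applications of machine learning, reliable and safe systems must consider measures of performance beyond standard test set accuracy. These other goals include out-of-distribution (OOD) robustness, prediction consistency, resilience to adversaries, calibrated uncertainty estimates, and the ability to detect anomalous inputs. However, improving performance towards these goals is often a balancing act that today's methods cannot achieve without sacrificing performance on other safety axes. For instance, adversarial training improves adversarial robustness but sharply degrades other classifier performance metrics. Similarly, strong data augmentation and regularization techniques often improve OOD robustness but harm anomaly detection, raising the question of whether a Pareto improvement on all existing safety measures is possible. To meet this challenge, we design a new data augmentation strategy utilizing the natural structural complexity of pictures such as fractals, which outperforms numerous baselines, is near Pareto-optimal, and roundly improves safety measures.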

Measuring Coding Challenge Competence With APPS

Dan Hendrycks , Steven Basart , Saurav Kadavath , Mantas Mazeika , Akul Arora , Ethan Guo , Collin Burns , Samir Puranik , Horace He , Dawn Song

Categories: Natural Language Processing | Machine Learning

2021-05-20

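While programming is one of the most broadly applicable skills in modern society, modern machine learning models still cannot code solutions to basic problems. Despite its importance, there has been surprisingly little work on evaluating code generation, and it can be difficult to assess code generation performance accurately. To meet this challenge, we introduce APPS, a benchmark for code generation. Unlike prior work in more restricted settings, our benchmark measures the ability of models to take an arbitrary natural language specification and generate satisfactory Python code. Similar to how companies assess candidate software developers, we then evaluate models by checking their generated code on test cases. Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges. We fine-tune large language models on both GitHub and our training set, and we find that the prevalence of syntax errors is decreasing exponentially as models improve. Recent models such as GPT-Neo can pass approximately 20% of the test cases of introductory problems, so we find that machine learning models are now beginning to learn how to code. As the social significance of automatic code generation increases over the coming years, our benchmark can provide an important measure for tracking advancements.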
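The evaluation idea in the abstract above, checking generated code against test cases, can be sketched in a few lines. This is not the APPS harness: the function name `fraction_passed`, the `solve` entry point, and the sample programs are all hypothetical, and the sketch runs candidates with `exec` in the current process with no sandboxing or timeouts, which a real harness would need.

```python
def fraction_passed(candidate_source, test_cases, entry_point="solve"):
    """Return the fraction of test cases a generated program passes.

    candidate_source: Python source text expected to define `entry_point`.
    test_cases: non-empty list of (args_tuple, expected_output) pairs.
    """
    namespace = {}
    try:
        # Syntax errors in the generated code surface here.
        exec(candidate_source, namespace)
        func = namespace[entry_point]
    except Exception:
        return 0.0

    passed = 0
    for args, expected in test_cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            # A runtime error counts as a failed test case.
            pass
    return passed / len(test_cases)


# Made-up candidates for an "add two integers" problem.
tests = [((1, 2), 3), ((0, 0), 0)]
correct = "def solve(a, b):\n    return a + b\n"
buggy = "def solve(a, b):\n    return a - b\n"
broken = "def solve(a, b) return a + b"  # missing colon: a syntax error

for name, source in [("correct", correct), ("buggy", buggy), ("broken", broken)]:
    print(name, fraction_passed(source, tests))
```

Reporting a per-problem pass fraction rather than a single pass or fail flag gives partial credit to programs that handle some cases, which matches the abstract's phrasing of passing a percentage of test cases.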

Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty

Dan Hendrycks , Mantas Mazeika , Saurav Kadavath , Dawn Song

Categories:

2019-06-28

Self-supervision provides effective representations for downstream tasks without requiring labels. However, existing approaches lag behind fully supervised training and are often not thought beneficial beyond obviating or reducing the need for annotations. We find that self-supervision can benefit robustness in a variety of ways, including robustness to adversarial examples, label corruption, and common input corruptions. Additionally, self-supervision greatly benefits out-of-distribution detection on difficult, near-distribution outliers, so much so that it exceeds the performance of fully supervised methods. These results demonstrate the promise of self-supervision for improving robustness and uncertainty estimation and establish these tasks as new axes of evaluation for future self-supervised learning research.

Using Pre-Training Can Improve Model Robustness and Uncertainty

Dan Hendrycks , Kimin Lee , Mantas Mazeika

Categories:

2019-01-28

He et al. (2018) have called into question the utility of pre-training by showing that training from scratch can often yield similar performance to pre-training. We show that although pre-training may not improve performance on traditional classification metrics, it improves model robustness and uncertainty estimates. Through extensive experiments on adversarial examples, label corruption, class imbalance, out-of-distribution detection, and confidence calibration, we demonstrate large gains from pre-training and complementary effects with task-specific methods. We introduce adversarial pre-training and show approximately a 10% absolute improvement over the previous state-of-the-art in adversarial robustness. In some cases, using pre-training without task-specific methods also surpasses the state-of-the-art, highlighting the need for pre-training when evaluating future methods on robustness and uncertainty tasks.

Deep Anomaly Detection with Outlier Exposure

Dan Hendrycks , Mantas Mazeika , Thomas Dietterich

Categories:

2018-12-11

It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.
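The abstract above does not spell out a training objective, so the following is only a sketch of one way to train against an auxiliary outlier set. It assumes a classification setting in which the usual cross-entropy on in-distribution examples is combined with a term that pushes predictions on auxiliary outliers toward the uniform distribution. The function names and the weight of 0.5 are illustrative assumptions, and the logits are made-up numbers rather than model outputs.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def cross_entropy(logits, label):
    """Standard cross-entropy for one in-distribution example."""
    return -log_softmax(logits)[label]


def uniform_cross_entropy(logits):
    """Cross-entropy from the uniform distribution to the model's softmax.

    It is smallest (log of the class count) when the prediction is uniform.
    """
    log_probs = log_softmax(logits)
    return -sum(log_probs) / len(log_probs)


def oe_loss(in_logits, in_labels, out_logits, weight=0.5):
    """In-distribution loss plus a weighted outlier term (assumed form)."""
    in_term = sum(
        cross_entropy(l, y) for l, y in zip(in_logits, in_labels)
    ) / len(in_logits)
    out_term = sum(uniform_cross_entropy(l) for l in out_logits) / len(out_logits)
    return in_term + weight * out_term


# A confident prediction on an outlier is penalized more than a uniform one.
print(uniform_cross_entropy([0.0, 0.0, 0.0]))
print(uniform_cross_entropy([10.0, 0.0, 0.0]))
```

With this form of objective, a model that is confident on an auxiliary outlier incurs a larger penalty than one that spreads probability evenly, which is what lets the maximum softmax probability serve as an anomaly score at test time.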
